related data) of incumbents on a sample of jobs in terms of the nine tests
of the General Aptitude Test Battery (GATB) of the United States Training
and Employment Service.

Since the GATB tests are not available for general use, the present
study was directed toward the use of the PAQ as the basis for predicting
test-related data for various commercially-available tests which were
considered to measure the same "constructs" as those measured by the GATB
tests. Data were obtained for a sample of 96 jobs, the data consisting of a
PAQ  analysis  for  each  job  and  data  for  the  incumbents  on  various  commercially 
available  tests.  Depending  on  the  nature  of  the  test  data  available  for 
individual  jobs,  four  types  of  criteria  were  used  as  indications  of  the 
"importance"  of  individual  tests  for  each  job  in  question.  These  criteria 
were:  (1)  the  mean  test  score  of  incumbents  on  the  job;  (2)  a "potential" 
cut-off score (which was the score which had actually been used as a
"cut-off" for selection purposes by the organization which supplied the data);
(3)  a validity  coefficient;  and  (4)  an  indication  as  to  whether  the  test 
would  be  "valid"  for  the  job.  The  data  for  the  various  jobs  for  which  any 
given  test  had  been  used  were  grouped  into  categories  for  the  "constructs" 
represented  by  the  individual  GATB  tests.  (Adequate  data  were  available  for 
only  five  of  the  nine  constructs.)  The  available  "norms"  of  the  tests  re- 
presenting any  given  construct  were  converted  to  a set  of  standard  scores 
with  a mean  of  100  and  a standard  deviation  of  20  (that  used  with  the  GATB 
tests). The PAQ job dimension scores were then used as predictors of
whatever test-related criteria were available for any given construct.

The PAQ-based predictions for the criteria of mean test scores and cut-off
scores were all highly significant. However, the predictions for actual
validity coefficients were understandably low. The predictions of whether or
not specific tests would be "valid" were also significant (in 75 percent of
the cases for which the PAQ predicted a test would be valid, the tests
indeed proved to be valid predictors).

Although  the  predictions  supported  the  utility  of  the  PAQ-based  job- 
component  validity  model,  a number  of  problems  probably  resulted  in  overly 
conservative  predictions.  This  particular  report  is  of  a preliminary 
nature,  based  on  the  data  available  at  the  time.  A later  analysis  will  be 
carried  out  within  a few  months  after  the  sample  size  is  increased. 




TABLE OF CONTENTS

INTRODUCTION
    The Position Analysis Questionnaire (PAQ)
        Principal components analyses of the PAQ
        Use of the PAQ to establish job component validity
    Objectives of the Present Study

METHOD
    Constructs Used in the Study
    Development of Equivalent Norms
    Actual Criteria Used
    Development of Predicted Criterion Values
    Relating Predicted to Actual Criterion Values

RESULTS

DISCUSSION

LIST OF REFERENCES

APPENDIX A: Jobs Included in the Sample

APPENDIX B: Tests Used to Measure the Various Constructs


LIST OF TABLES

1. Correlations Between Predicted and Actual Mean Test Scores, Cutoff
   Scores, Validity Coefficients, and Phi Coefficients for Valid - Non
   Valid Tests

2. Frequency Count of Correct and Incorrect Predictions Only When
   PAQ-based Data Predict Tests to be Valid Indicators of Job Performance


INTRODUCTION 


The  conventional  method  for  identifying  personnel  tests  to  be 
used  in  the  selection  of  personnel  for  various  jobs  consists  of 
validation of tests for each particular job in question. This procedure
involves: (1) the administration of a sample of tests to
incumbents  who  are  already  on  the  job  in  question  or  to  applicants 
who  are  going  to  be  placed  on  the  job;  (2)  the  obtaining  of  some 
criterion  measure  of  job  performance  for  the  individuals  who  have 
taken  the  tests;  and  (3)  the  analysis  of  the  statistical  rela- 
tionships between  the  test  scores  and  the  criterion  of  job  perform- 
ance. Those  tests  for  which  a significant  relationship  is  found 
between  test  scores  and  job  performance  criterion  values  can  then  be 
used  as  a basis  for  the  selection  of  individuals  for  the  job  in 
question.  As  is  indicated  in  step  1 of  the  above  procedure  there 
are  actually  two  variations  of  the  general  test  validation  method- 
ology. One  of  these  is  a concurrent  procedure,  which  involves  the 
use  of  a sample  of  individuals  actually  on  the  job.  The  other  method, 
which  is  referred  to  as  predictive  validity,  consists  of  the  ad- 
ministration of  the  tests  to  candidates  for  the  job,  and  the  later 
analysis  of  the  relationship  between  test  scores  and  the  criterion 
of  job  performance  after  the  individuals  have  had  sufficient  time  to 
be  able  to  demonstrate  their  job-performance  abilities.  (In  the 
case  of  predictive  validity,  the  test  is  not  used  in  the  actual  selec- 
tion of  the  job  candidates  used  in  the  validation  procedure.) 

These  procedures  are  time  consuming,  and  in  some  instances  are 
not  feasible  at  all,  as  is  the  case,  for  example,  if  the  sample  of 
job  candidates  is  too  small  for  carrying  out  a conventional  validity 
study.  Thus,  over  the  years,  there  have  been  certain  efforts  made 
to  develop  some  type  of  "generalized"  procedure  that  could  be  used 
in the development of test batteries, a procedure that would be
essentially rooted in the systematic analysis of job characteristics.

The  concept  of  a generalized  approach  to  the  establishment  of 
test  batteries  for  personnel  selection  using  information  about  the 
job  obtained  through  systematic  job  analysis  procedures  was  initially 
referred  to  by  Lawshe  (1952)  as  synthetic  validity,  and  was  later 
described by Balma (1959, p. 359) as follows: "The inferring of
validity in a specific situation from a logical analysis of jobs into
their  elements,  a determination  of  test  validity  for  those  elements, 
and  a combination  of  elemental  validities  into  a whole."  Since 
the term synthetic validity has been criticized as being not specifically
appropriate to such a procedure, McCormick (1974) has suggested
the  use  of  the  term  job  component  validity. 

The development of a procedure for establishing the job component
validity of predictors for jobs would consist of the following
(McCormick, et al., 1972): (1) some method of identifying the
constituent components of jobs (which are referred to as job elements by
Balma); (2) a method for determining, for an experimental sample of
jobs, the human attribute(s) required for successful job performance
when a given job component is common to several jobs; and (3)
some method of combining the estimates of human attributes required
for individual job components into an overall estimate of the human
attribute requirements for an entire job. Such a procedure would
make it possible to "build up" the aptitude requirements for any
given job by: (1) knowing what job components occur in the job in
question; (2) knowing what aptitudes are required for each such
component; and (3) having a procedure for summating the attribute
requirements that are relevant to the individual job components.
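
The "build up" logic described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each job component has a numeric score for the job and that each component has known attribute weights, and that summation is a simple weighted sum. The function name, the component labels, and all numbers are hypothetical; none of them come from the PAQ research itself.

```python
from typing import Dict


def build_up_requirements(job_components: Dict[str, float],
                          attribute_weights: Dict[str, Dict[str, float]]
                          ) -> Dict[str, float]:
    """Sum attribute requirements over the components present in a job.

    job_components: component name -> how strongly it occurs in the job.
    attribute_weights: component name -> {attribute -> weight}.
    Both structures are hypothetical stand-ins for real job analysis data.
    """
    totals: Dict[str, float] = {}
    for component, amount in job_components.items():
        for attribute, weight in attribute_weights.get(component, {}).items():
            totals[attribute] = totals.get(attribute, 0.0) + amount * weight
    return totals


# Hypothetical job with two components and made-up attribute weights.
JOB = {"uses written materials": 2.0, "operates keyboard": 1.0}
WEIGHTS = {
    "uses written materials": {"V": 0.5, "G": 0.25},
    "operates keyboard": {"F": 0.5, "V": 0.25},
}
REQUIREMENTS = build_up_requirements(JOB, WEIGHTS)
```

A weighted sum is only one possible way of "summating" requirements; the regression approach described later in this report estimates the weights from data rather than assuming them.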


The  Position  Analysis  Questionnaire  (PAQ) 

Various  procedures  have  been  used  in  the  development  of  some 
type  of  job  component  or  generalized  validity  procedure.  One  of 
these  has  involved  the  use  of  the  Position  Analysis  Questionnaire 
(PAQ). The PAQ is a structured job analysis questionnaire that provides
for the analysis of individual jobs in terms of 187 job elements.
In the analysis of jobs with the PAQ job elements, various
rating  scales  are  used  (the  particular  rating  scale  used  for  each 
job  element  being  the  one  for  which  the  concept  or  the  scale  seems 
particularly appropriate to the element). Most of the scales are
six-point  Likert-type  scales,  ranging  from  zero  (does  not  apply)  to 
five  (the  highest  value).  The  various  scales  used  include  those 
dealing  with  importance,  time,  extent  of  use,  and  in  some  instances 
special  scales.  In  certain  instances  a dichotomous  scale  is  used. 
The  dichotomous  scale  provides  for  indicating  whether  the  job  ele- 
ment in  question  does,  or  does  not,  apply  to  the  job. 


Principal  components  analyses  of  the  PAQ.  Data  based  on  the 
PAQ have been subjected to various principal components analyses.
Of particular relevance to our present interest are those carried
out by Jeanneret and McCormick (June 1969), and by Marquardt and
McCormick (1974a). The first of these was based on a sample of 536
jobs, and resulted in the identification of 32 principal components
which are referred to as job dimensions. Twenty-seven of these were based on
the  principal  components  analyses  of  the  job  elements  within  each  of 
the  six  divisions  of  the  PAQ,  and  the  other  five  were  based  on  the 
principal  components  analysis  of  most  of  the  job  elements  of  the  PAQ 
pooled  together.  The  study  by  Marquardt  and  McCormick  was  based  on 
a sample  of  3700  jobs,  and  resulted  in  the  identification  of  30  job 
dimensions  resulting  from  the  principal  components  analyses  of  the 
job elements in each of the six PAQ divisions, and 14 based on an
overall or general principal components analysis using the pooled
elements from all six of the PAQ divisions.


Use of the PAQ to establish job component validity. One of
the primary uses of the PAQ has been in the framework of establishing
the job component validity of tests for various jobs.
This  has  consisted  primarily  of  the  analysis  of  samples  of  jobs 
for  which  test  data  for  the  job  incumbents  were  available  from  the 
United States Training and Employment Service (USTES), and for
which PAQ analyses were available. The USTES publishes test data
for incumbents on several hundred jobs, the test data consisting
of normative and validity data for the nine tests of the General
Aptitude Test Battery (GATB). These tests are as follows:


G - Intelligence 
V - Verbal  Aptitude 
N - Numerical  Aptitude 
S - Spatial  Aptitude 
P - Form  Perception 
Q - Clerical Perception
K - Motor Coordination
F - Finger Dexterity
M - Manual  Dexterity 


As  the  primary  approach  to  the  use  of  PAQ-based  data  in  the  job 
component  validity  framework,  samples  of  jobs  were  selected  which 
"matched"  the  ones  for  which  the  USTES  has  published  test  normative 
and/or  validity  data.  In  these  analyses,  the  primary  criterion  of 
the  "importance"  of  a given  test  to  any  given  job  consisted  of  the 
mean  GATB  test  scores  of  the  incumbents  on  the  individual  jobs. 

This  criterion  was  based  on  the  assumption  that  individuals  tend  to 
"gravitate"  into  jobs  which  are  commensurate  with  their  own  abili- 
ties. Thus,  it  would  be  assumed  that  jobs  for  which  the  incumbents 
have  high  mean  test  scores  on  a given  test  would  require  more  of  the 
quality  measured  by  the  test  than  jobs  for  which  the  incumbents  have 
lower  mean  test  scores.  Using  mean  test  scores  as  a criterion,  the 
scores  on  the  PAQ  job  dimensions  were  then  used  in  a regression  pro- 
cedure for  the  prediction  of  the  mean  GATB  test  scores. 
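
A minimal sketch of such a regression procedure follows. It is not the authors' program: the function names and the small data set are hypothetical, and ordinary least squares solved through the normal equations is assumed as the fitting method. Each row of predictors stands for the job dimension scores of one job, and the criterion is the mean test score of that job's incumbents.

```python
from typing import List, Sequence


def fit_dimension_weights(dim_scores: Sequence[Sequence[float]],
                          mean_scores: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimizing squared prediction error."""
    rows = [[1.0] + [float(v) for v in job] for job in dim_scores]
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, mean_scores))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def predict_mean_score(weights: Sequence[float],
                       job_dims: Sequence[float]) -> float:
    """Apply fitted weights to the dimension scores of a new job."""
    return weights[0] + sum(w * d for w, d in zip(weights[1:], job_dims))


# Hypothetical data: five jobs, two job dimensions, made-up mean scores.
JOB_DIMS = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
MEAN_SCORES = [90, 95, 87, 92, 97]
WEIGHTS = fit_dimension_weights(JOB_DIMS, MEAN_SCORES)
```

With real data the number of job dimensions is much larger than two, and the fit would not be exact as it is in this contrived example.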

Two such studies have been carried out. The first of these,
by  Mecham  and  McCormick  (1969),  involved  a sample  of  PAQ  analyses 
for  179  positions  which  "matched"  90  jobs  for  which  the  USTES  pub- 
lished test  data.  In  this  instance  the  PAQ  job  dimensions  that  were 
used  as  predictors  were  those  developed  by  Jeanneret  and  McCormick 
(1969).  In  the  second  study  PAQ  analyses  for  659  positions  were 
matched  with  141  jobs  for  which  the  USTES  had  published  test  data 
(Marquardt and McCormick, 1974b). In the case of both of these
studies  the  prediction  of  the  mean  test  scores  of  the  incumbents 
from  PAQ  job  dimension  scores  was  quite  respectable.  The  ranges  and 
medians  of  the  multiple  correlations  across  the  nine  GATB  tests  re- 
sulting from  these  studies  are  given  below. 


                                 Correlations
                               Range        Median

Mecham and McCormick         .59 to .80      .71
Marquardt and McCormick      .46 to .76      .73

In both of these studies the predictions of the cognitive
tests  were  best,  those  of  the  perceptual  tests  were  intermediate, 
and  those  of  the  psychomotor  tests  were  the  lowest.  Although 
there  were  differences  in  the  predictions  for  the  various  types 
of  tests,  the  general  level  of  prediction  was  viewed  as  demon- 
strating the  potential  utility  of  the  use  of  such  a procedure  for 
the  establishment  of  the  job  component  validity  of  personnel  se- 
lection tests. 


Objectives  of  the  Present  Study 

The use of PAQ job dimension scores for the prediction of
mean GATB test scores of "hypothetical" samples of job incumbents
on various jobs clearly can give some indications of the aptitudes
that presumably would be required for individual jobs. However,
since the predictions are in terms of GATB test scores, and
since the GATB tests are not available for use by private
organizations, the operational use of such predictions would necessitate
that the predictions in terms of the nine GATB tests be
"converted" into terms corresponding to those of commercially-available
tests. Thus, it would be desirable to develop some
procedure for use of the GATB test score predictions as the basis
for the selection of corresponding commercially-available tests,
and for the estimation of scores for such tests which correspond
to those of the GATB tests. Thus one could use predictions of
appropriate GATB test score cut-offs as the basis for deriving
estimates of cut-off scores on other (corresponding) tests which
would be comparable to those of the GATB test in question. The
basic  objective  of  the  present  study  has  been  that  of  developing 
some  procedure  for  shifting  from  the  prediction  of  GATB  test  scores 
to  the  prediction  of  scores  on  commercially-available  tests  that 
presumably  correspond  with  those  of  the  several  GATB  tests. 

There  are  two  possible  general  approaches  to  the  "matching" 
of  GATB  and  commercially-available  tests  that  might  serve  as  the 
basis  for  converting  from  one  to  another.  The  preferable  approach 
would  be  one  for  which  data  are  available  for  two  tests  that  are 
based  on  the  scores  of  the  individuals  in  a "general  population" 
who  have  taken  both  tests.  The  equivalence  of  two  such  tests 
would  best  be  reflected  by  a high  correlation  between  the  two. 

In  turn,  corresponding  norms  for  the  two  tests  preferably  should  be 
available  for  the  general  population,  in  order  to  make  it  possible 
to  "convert"  scores  from  one  test  to  equivalent  scores  on  another 
test  in  terms  of  either  standard  deviation  units  from  the  mean,  or 
in  terms  of  percentile  norms.  The  USTES  has  published  data  on 
certain  commercial  tests  that  have  been  administered  to  the  same 
samples  of  individuals  who  have  taken  certain  of  the  GATB  tests. 
These  data,  however,  were  found  not  to  be  particularly  useful  for 
this  study,  since  many  of  the  tests  for  which  such  data  were  pre- 
sented were  those  more  typically  used  in  educational  circumstances 
rather  than  for  personnel  selection.  Also  many  of  the  samples  of 
individuals  represented  in  the  normative  data  consisted  of  students 
or  of  individuals  on  given  occupations  rather  than  of  the  "general 
population." In most instances normative data were simply not
available. 

The  second  approach  is  one  in  which  a judgment  needs  to  be 
made  about  the  equivalence  of  test  content,  or  the  equivalence  of 
the  "construct"  that  presumably  is  being  measured  by  the  two  tests 
in  question.  This  is  admittedly  a subjective  evaluation,  and  there- 
fore needs  to  be  approached  with  caution.  In  the  case  of  some  pairs 
of  tests  there  is  no  particular  problem  in  making  a reasonably  valid 
judgment  about  their  equivalence,  but  in  the  case  of  other  tests 
the  subjective  judgment  may  not  be  entirely  valid.  In  those  in- 
stances where  tests  are  considered  to  be  equivalent,  there  is  of 
course  the  further  possible  problem  of  conversion  of  norms  from  one 
test  to  those  for  the  second  test.  Many  of  the  norms  presented  in 
test  manuals  are  for  individuals  on  certain  jobs  or  in  various  job 
groupings,  without  there  being  norms  available  for  what  might  be 
viewed  as  a "general"  population.  (It  might  be  added  that  the  norms 
for  the  GATB  test  have  been  based  on  a sample  of  4000  individuals 
whose  jobs  are  reasonably  representative  of  the  major  occupational 
groups  of  workers  in  the  labor  market.  Therefore,  the  application 
of  this  approach  preferably  would  require  the  availability  of  a 
reasonably  comparable  set  of  norms  for  any  other  test  that  would  be 
considered as being essentially "equivalent" to one of the GATB
tests.)


METHOD 

The  primary  focus  of  this  study,  then,  was  to  develop  some  way 
of testing the utility of a job-component validity model, based on
PAQ  job  dimensions,  for  use  with  commercially-available  tests.  To 
accomplish  this  it  would  be  necessary  to  translate  the  predictions 
for  the  GATB  tests  made  by  combining  the  PAQ  job  dimension  scores 
into  terms  relevant  to  commercially-available  tests  representing 
similar  aptitude  constructs.  Therefore,  this  study  was  viewed  as 
a test  of  the  generalizability  of  the  PAQ-based  job-component  valid- 
ity model  that  has  been  heretofore  tested  only  with  GATB  test  data. 

The  basic  approach  used  has  been  that  of  obtaining,  from  vari- 
ous organizations,  data  from  any  validity  studies  that  they  had 
carried  out  for  jobs  in  their  organizations,  as  well  as  obtaining 
PAQ  analyses  for  any  such  jobs.  Several  approaches  were  used  in 
an  attempt  to  obtain  such  data,  including  the  following: 

1.  Direct  mailing  of  letters  to  several  hundred  organizations, 
explaining  the  goals  of  the  project  and  asking  them  to 
submit  any  relevant  test  data  they  had  available  (validity 
information as well as normative data), and asking them to
arrange  for  the  analysis  of  jobs  in  question  with  the  PAQ. 

2.  Establishing  contacts  with  various  test  publishing  firms, 
asking  them  if  they  had  any  validity  or  normative  data  on 
any  of  the  tests  which  they  published  for  incumbents  on 
specific jobs.

3. Mailings to consulting firms which it was believed
were  involved  in  test  validation  studies. 

4.  Mailings  to  former  graduate  students  of  Purdue  Univer- 
sity. 

5.  Appeals  made  in  certain  publications  which  it  was  felt 
had  the  audiences  that  it  might  be  useful  to  contact, 
asking  if  anyone  had  the  kinds  of  data  we  were  seeking. 
These appeals included articles in The Industrial Psychologist
(TIP) (the newsletter of the Division of
Industrial/Organizational Psychology of the American
Psychological Association), and the Personnel Administrator,
which is the official organ of the American
Society for Personnel Administration.

Although  the  combination  of  all  of  these  sources  yielded 
test  data  for  incumbents  for  a moderate  number  of  jobs,  there  was 
still a problem in the case of certain jobs. In the case of certain
jobs for which test data were available the organization or
individual furnishing the test data was unable or unwilling to arrange
for the analysis of those jobs with the PAQ. In order to
include these jobs in our sample, it was decided to "match" these
jobs with jobs which had already been analyzed with the PAQ and
which were in the PAQ data bank (presently consisting of some 20,000
jobs). This matching was carried out on the basis of job code
numbers  from  the  Dictionary  of  Occupational  Titles  (D.O.T.),  which, 
although  admittedly  imperfect,  seemed  to  provide  a reasonable  basis 
for  matching  jobs.  Even  after  augmenting  the  sample  in  this  way, 
however,  the  total  sample  still  consisted  of  only  some  58  jobs.  This 
was  not  seen  as  a large  enough  sample  to  allow  for  any  meaningful 
analyses. Thus, other attempts to obtain relevant normative and/or
validity  data  were  made. 

In  an  attempt  to  enlarge  the  sample  of  jobs,  some  archival 
data were used. Such data consisted mainly of validity or normative
data  for  various  tests  for  incumbents  on  different  jobs  as  reported 
in  sources  such  as  the  Validity  Information  Exchange  of  Personnel 
Psychology, The Handbook of Employee Selection (Dorcus and Jones,
1950),  and  the  manuals  for  various  tests  commonly  used  in  industry. 
The  PAQ  analyses  for  the  jobs  for  which  data  were  obtained  through 
these  sources  had  to  be  obtained  by  matching  these  jobs  with  jobs 
in  the  PAQ  data  bank.  As  before,  this  was  done  on  the  basis  of 
D.O.T. code numbers. These archival sources yielded data on an additional
38 jobs, bringing the total sample for this study to 96 jobs.*

* It should be noted that obtaining test data and PAQ analyses for
jobs so they could be included in this study was the major difficulty
encountered in the project. The present sample took almost 2
years to collect, and new approaches to data collection are currently
being pursued. It was felt that a preliminary report should at
least  be  prepared  to  describe  this  line  of  research,  but  it  is  hoped 
that  present  efforts  will  allow  a more  extensive  analysis  later  with 
a much  larger  sample. 


With this final sample of 96 jobs for which there were available
test data from either an organization or various archival
sources,  as  well  as  a PAQ  analysis  for  each  job,  the  following 
operations  were  carried  out: 

1. The commercially-available tests for which data were now
available had to be matched to individual GATB tests
which were judged to measure the same "constructs."

2.  A method  had  to  be  developed  which  would  allow  the 
equating  of  the  norms  for  the  commercially-available 
tests  with  the  norms  of  the  corresponding  GATB  tests. 

3. Analyses would be carried out relating, for each job,
the PAQ-predicted GATB test data to the data on the
commercially-available test in question. This comparison
would revolve around: (1) the mean test scores; (2)
what were referred to as "cut-off" scores (which in most
instances consisted of the test score one standard
deviation below the mean of the scores of the incumbents
on the job in question); (3) validity coefficients;
and (4) the determination of whether or not the test
would be a "valid" predictor of performance for the job
in question.

Although  the  total  sample  consisted  of  96  jobs,  a small 
portion  of  the  sample  was  based  on  data  for  jobs  in  each  of 
seven  clusters  which  had  been  formed  by  one  organization. 

(These  clusters  had  been  used  in  a previous  test  validation 
study  the  organization  had  carried  out.)  PAQ  analyses  were 
available  for  certain  of  the  jobs  within  each  cluster,  but 
not  for  all  jobs.  The  test  data  for  the  incumbents  of  all  of 
the  jobs  within  each  cluster  were  not  differentiated  by  specif- 
ic job.  Thus,  it  was  not  possible  to  relate  PAQ  analyses  for 
the  individual  jobs  with  the  test  data  for  the  incumbents  on 
those same identical jobs. In view of this, "composite"
PAQ analyses were then used as the "predictors" of the test
data for the incumbents on the jobs in the clusters.

In  view  of  the  special  treatment  of  the  data  for  the  jobs 
in  these  seven  clusters  a complete  set  of  analyses  was  also 
carried  out  for  all  of  the  jobs  excluding  these.  This  set  of 
analyses  then  was  based  on  a subsample  of  89  jobs  (96  minus 
the 7).


Constructs Used in the Study

As  was  mentioned  above,  this  study  required  relating  data  on 
a number  of  commercially-available  tests  to  the  GATB  tests.  This 
presented  a bit  of  a problem  in  that  the  factorial  composition  of 
apparently  "similar"  tests  is  sometimes  quite  different,  and  thus 
two  tests  of,  for  example,  verbal  ability,  may  actually  be  testing 
somewhat  different  abilities  or  attributes.  Recognizing  this  prob- 
lem, it  was  nonetheless  decided  that  the  only  feasible  way  that 
this  study  could  be  carried  out  would  be  to  consider,  as  measures  of 
the  same  "construct,"  all  of  the  commercially-available  tests  that 
purport  to  measure  the  same  construct  as  that  represented  by  any 
given  GATB  test.  Although  the  actual  GATB  tests  were  not  available, 
there  is  information  published  concerning  the  general  nature  of 
each  subtest.  Thus,  for  example,  a test  of  "verbal  aptitude"  that 
consisted  primarily  of  reading  comprehension  items  would  not  be 
included  as  a test  of  the  "Verbal  Aptitude"  construct  as  measured 
by  the  verbal  (V)  GATB  test  used  in  this  study  to  represent  that 
construct.

Since  the  entire  framework  of  this  study  revolved  around  the 
GATB,  it  was  only  natural  that  the  constructs  which  would  be  of 
interest  would  be  the  nine  measured  by  the  subtests  of  the  GATB, 
these being General Intelligence (G), Verbal Aptitude (V), Numerical
Aptitude (N), Spatial Aptitude (S), Form Perception (P), Clerical
Perception (Q), Motor Coordination (K), Finger Dexterity (F),
and Manual Dexterity (M). The commercially-available tests used
to measure each construct, and the number of jobs for which each
such test was used, are given in Appendix B.


Development  of  Equivalent  Norms 

The  determination  of  which  commercially-available  tests 
measured  each  of  the  constructs  in  question,  although  basically 
a judgemental  question,  did  not  pose  any  serious  problems.  Once 
the  individual  tests  had  been  classified  as  measuring  a specific 
construct,  it  was  then  necessary  to  develop  a method  of  equating 
scores  on  each  of  the  commercially-available  tests  used  to  measure 
the  construct  to  test  scores  on  the  GATB  subtests  for  that  same 
construct.  The  optimal  procedure  for  accomplishing  this  would  have 
been  to  have  available  normative  test  data  for  a single,  general 
population  on  all  the  tests  within  a particular  construct.  Such 
data  were  not  available,  however,  and  other  methods  had  to  be  em- 
ployed. These  methods  involved  the  combining  and  synthesizing  of 
general  norm  groups,  and  eventually  all  test  scores  were  expressed 
in  the  same  standard  score  units,  those  units  based  on  the  standard 
score  distribution  reported  for  the  GATB  tests.  The  GATB  standard 
scores  are  based  on  a mean  of  100  and  a standard  deviation  of  20. 

A more detailed explanation of the methods employed to develop
these equivalent norms is given in Appendix C.
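
The final rescaling step can be sketched as follows. This is only an illustration of expressing a raw score in standard score units with a mean of 100 and a standard deviation of 20; it assumes that a suitable norm-group mean and standard deviation for the commercial test are already known, and it does not reproduce the combining and synthesizing of norm groups described in Appendix C. The function name and the example numbers are hypothetical.

```python
GATB_MEAN = 100.0  # mean of the GATB standard score scale
GATB_SD = 20.0     # standard deviation of the GATB standard score scale


def to_gatb_scale(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Express a raw test score in GATB-style standard score units.

    norm_mean and norm_sd describe the (assumed) general norm group of
    the commercially-available test on its own raw score scale.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw_score - norm_mean) / norm_sd  # distance from mean in SD units
    return GATB_MEAN + GATB_SD * z


# Hypothetical test with a norm-group mean of 25 and SD of 5.
EXAMPLE = to_gatb_scale(30.0, norm_mean=25.0, norm_sd=5.0)
```

A raw score one standard deviation above the norm-group mean therefore maps to 120 on the common scale, which is what allows scores from different tests of the same construct to be placed on one continuum.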

As  a result,  for  any  construct,  it  was  possible  to  locate, 
for  each  job  for  which  test  data  for  a test  measuring  that  con- 
struct were  available,  the  position  on  the  continuum  of  scores  on 
the  construct  where  the  sample  of  incumbents  on  that  job  would 
fall. It was the scores of these incumbents on the different
constructs that were used as criterion values in this study, such
criterion values for various jobs being viewed as reflecting the
relative "importance" to the jobs in question of the construct in
question.


Actual  Criteria  Used 

Four different criteria as related to individual jobs were
used in the study, these being considered as reflecting various
indices of the "importance" to the individual jobs of each of the
nine constructs as represented by the GATB tests. These criteria,
for each job and for each test, consisted of: (1) the mean test score
of the job incumbents on the individual job; (2) a potential
cut-off score; (3) a coefficient of validity; and (4) an
indication of whether the test would be "valid" for the job. The first
two of these were considered to be the primary criteria used in
the study. Considering for a moment the criterion of mean test
scores of incumbents on the individual jobs, one could view a
continuum for each of the nine constructs expressed in standard score
form, with the mean scores of incumbents on the various jobs
falling in various positions along that continuum, from low to high.

(As indicated earlier, the conversion of the norms of the
commercially-available tests to the standard score form of the
GATB tests served as the common metric for relating the mean scores
of incumbents on that continuum.) In the case of certain jobs the
cut-off score criterion consisted of scores that were actually
used as cut-off scores in selecting people for the jobs in question
by the organizations which had provided the test data. In the
case of most jobs, however, this was a "potential" cut-off score
which was one standard deviation below the mean of the scores of
incumbents on the job in question.
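
The "potential" cut-off rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the example values are hypothetical and do not come from the study's data.

```python
def potential_cutoff(incumbent_mean: float, incumbent_sd: float) -> float:
    """Return a 'potential' cut-off score: one standard deviation
    below the mean of the incumbents' scores on the job."""
    if incumbent_sd < 0:
        raise ValueError("standard deviation cannot be negative")
    return incumbent_mean - incumbent_sd


# Hypothetical job (values assumed for illustration only): incumbents
# average 110 with a standard deviation of 15, in standard-score units.
print(potential_cutoff(110.0, 15.0))  # 95.0
```

Where an organization supplied the cut-off score it actually used, that score would be taken directly and no such computation would be needed.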

In the case of the criterion of validity coefficients it was
of course not necessary to be concerned about the "normative"
data that were used with the criteria of mean test scores and
predicted cut-off scores. Rather, for any given construct, the
coefficients of validity of the tests which were considered to
represent that construct could be viewed as representing a continuum
from low to high as expressed by the actual coefficient values
themselves. In the case of certain analyses the fourth criterion was
used, namely an indication as to whether individual tests would be
"valid" predictors of performance. This criterion was based on
whether the initial coefficient of validity for that
test was itself statistically significant or not.


Development  of  Predicted  Criterion  Values 

The predicted criterion values for the individual jobs were
derived from a standard computer printout of data that is produced
from the PAQ analysis of any given job. These computer printouts
are based on previous analyses of PAQ-based data as related to the
published USTES test data mentioned previously. Such data include,
for each of the sample jobs, and for each GATB test, estimates of
the mean test score of job incumbents, the standard deviation of
the distribution of test scores of incumbents on each job (to be
used to establish a "cut-off" score), and the validity coefficient.
The computer printout, based on regression analyses of PAQ job
dimension scores as related to these three values, provides
estimates, for any given job, of the first three criteria. In
connection with the fourth criterion, the computer program on which
these computer printouts are based also includes provision for
making a prediction about those tests (usually three) which would
be valid predictors of performance on the job in question. This
particular aspect of the program, in effect, is a "policy
capturing" procedure that parallels the practice of the USTES in its
approach to the identification of the three "best" or most "valid"
tests for use in the selection of individuals for any given job,
and in establishing cut-off scores for those three tests. The PAQ
printout, in effect, provides estimates of the cut-off scores of
those three tests. Thus, for any given job, the fourth criterion
consists of the identification of three tests which are predicted
to be the most "valid" for use in selecting people for any given
job, based on USTES practices in this area. Thus, in the case of
the three tests identified by the computer program as being most
"valid" for any given construct, a determination would be made as
to whether that test, in the actual validity setting, did in fact
turn out to have a significant validity coefficient for the job
in question.


Relating  Predicted  to  Actual  Criterion  Values 

As  implied  above,  the  predictions  of  the  four  different 
criteria  used  in  the  study  were  derived  from  the  conventional 
PAQ  computer  printout  for  the  individual  jobs  used  in  the  case  of 
the  analysis  of  any  given  construct.  In  these  predictions  there 
was  of  course  the  initial  "selection"  of  jobs  for  which  relevant 
test  data  were  available  for  the  incumbents  as  related  to  any 
given  construct.  In  addition,  given  those  jobs  for  which  test 
data  for  a given  construct  were  available,  there  was  a further 
selection,  for  individual  analyses,  of  the  types  of  test  criterion 
data which were actually available. Thus, for any given construct,
a job would be included in the analysis for any particular
criterion depending upon whether actual criterion data were available,
such as mean test scores, cut-off scores, validity coefficients,
or an indication as to whether a coefficient was or was not valid.
Thus, the analyses consisted of a series of sub-analyses for the
individual constructs, each sub-analysis consisting of data for
jobs for which both predicted and actual criterion data were
available.

In this process the actual (or obtained) test values for
incumbents on any job were converted to standard scores on the
constructs involved. The predicted scores were all in terms of
GATB tests, but, since the constructs were defined in terms of the
GATB subtests, and since all scores were in a common metric, this
presented no problem. The actual predicted scores themselves were
the result of combining the PAQ job dimension scores for any job
according to regression equations developed in earlier research
designed to predict GATB test scores from PAQ job dimension scores
(Mecham and McCormick, 1969). These equations were designed to
yield mean test scores, potential cut-off scores, and validity
coefficients, as well as to make some prediction about which tests
should be valid predictors of performance for a job. These predictions
were then compared to the actual criterion data in these areas by
means of a series of correlational analyses.
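
The general form of such a prediction, a weighted combination of job dimension scores plus a constant, might be sketched as follows. The intercept, weights, and dimension scores shown are purely hypothetical placeholders; the actual equations are those of the earlier research cited above and are not reproduced in this report.

```python
from typing import Sequence


def predict_from_dimensions(intercept: float,
                            weights: Sequence[float],
                            dimension_scores: Sequence[float]) -> float:
    """Apply a linear regression equation to a job's dimension scores."""
    if len(weights) != len(dimension_scores):
        raise ValueError("one weight is required per job dimension score")
    return intercept + sum(w * d for w, d in zip(weights, dimension_scores))


# Hypothetical three-dimension equation (all values assumed for illustration).
predicted_mean = predict_from_dimensions(
    intercept=100.0,
    weights=[4.0, -2.0, 1.5],
    dimension_scores=[0.5, 1.0, -1.0],
)
print(predicted_mean)  # 98.5
```

A separate equation of this form would be needed for each construct and for each predicted value (mean test score, standard deviation, validity coefficient).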


RESULTS 


For each of the nine aptitude constructs represented by the
nine tests of the General Aptitude Test Battery (GATB), Pearson
product-moment correlations were computed for each of the following
sets of data, the analyses in each instance being based on those
jobs for which relevant test data and criterion data were available.

(1) Predicted GATB mean test scores (as derived by procedures
involving job analysis data from the PAQ) and actual mean
test scores obtained for incumbents on each of the jobs
in the sample,

(2) Predicted GATB cut-off scores (scores one standard
deviation below the predicted mean test scores) and (in
the case of a few organizations) actual cut-off scores
which had been set for each of the jobs in the sample
(the actual cut-off scores were not necessarily one
standard deviation below the mean),

(3) Predicted validity coefficients (obtained from PAQ job
analysis procedures) and actual validity coefficients.

In addition to the Pearson product-moment correlations, a
phi coefficient was computed for each of the nine GATB aptitude
constructs between the predicted validity of a particular test
(valid=1, not valid=0) as derived from PAQ procedures, and the
actual validity of the tests (valid=1, not valid=0) as obtained in
actual validation procedures carried out by organizations providing
data for the present study. The Pearson product-moment correlations
as well as the phi coefficients computed for each of the nine GATB
aptitude constructs are presented in Table 1.
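
For readers unfamiliar with the two statistics, a minimal sketch of each is given here. The function names and the small example data set are hypothetical and are not taken from the study; the phi coefficient is computed from the 2 x 2 table of predicted versus actual validity and is numerically identical to a Pearson correlation computed on the 1/0 codes.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either list is constant


def phi_coefficient(predicted, actual):
    """Phi coefficient for two lists coded valid=1, not valid=0."""
    pairs = list(zip(predicted, actual))
    a = sum(1 for p, q in pairs if p == 1 and q == 1)  # predicted valid, was valid
    b = sum(1 for p, q in pairs if p == 1 and q == 0)  # predicted valid, was not
    c = sum(1 for p, q in pairs if p == 0 and q == 1)  # predicted not valid, was
    d = sum(1 for p, q in pairs if p == 0 and q == 0)  # predicted not valid, was not
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical codes for six tests (illustration only).
predicted_valid = [1, 1, 1, 0, 0, 0]
actual_valid = [1, 1, 0, 0, 0, 1]
print(round(phi_coefficient(predicted_valid, actual_valid), 3))  # 0.333
```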

One  should  note  that  no  test  data  were  available  for  either 
the  Form  Perception  or  Motor  Coordination  constructs.  In  the  case 
of  Finger  Dexterity  and  Manual  Dexterity,  data  were  available  on  only 
seven  jobs,  these  being  the  seven  job  clusters  mentioned  earlier  in 
this section. Data concerning these two ability areas probably
should be considered as essentially meaningless because of the small
sample size.

The results reported in Table 1 would seem to indicate a
substantial relationship between the PAQ predictions of mean
test scores and cut-off scores and the corresponding values
actually obtained for the jobs in the sample.
When considering the relationship between predicted and actual mean
test scores for those ability areas with sufficiently large sample
sizes to warrant consideration, five of five correlations are
significant at the .03 level or better. The correlations range from
.30 (Spatial Aptitude) to .68 (Clerical Perception). The results
concerning predicted and actual cut-off scores are similar. Again,
when considering only those ability areas with adequate sample sizes,
four of four correlations are significant at the .03 level or better.
The correlations range from .28 (Intelligence) to .70 (Verbal
Aptitude).

The relationship between predicted and actual validity
coefficients is not as promising, however. Only one of the five
correlations is significant at the .05 level, and it is in the
opposite direction to what would normally be expected. The
correlations range from -.36 (Verbal Aptitude) to .19 (Numerical
Aptitude). The data relating to the predicted versus actual validity or
non-validity of the particular tests show considerably stronger
relationships than those for the validity coefficients themselves. These
coefficients ranged from -.15 (Intelligence) to .53 (Spatial
Aptitude). It would thus seem that PAQ-based data were relatively
successful in predicting whether or not particular tests would
prove to be valid predictors of job performance.

One explanation of the relatively (moderately) low phi
coefficients obtained in the present study is that PAQ-based data are
conservative in their prediction of the validity or non-validity of
tests. Predictions from PAQ data would thus have a tendency to
predict as invalid a number of tests which might actually prove to
be valid indicators of job performance. Accordingly, a frequency
count was made of only those cases where the PAQ data predicted that
a particular test would be valid. In Table 2 are given the number
of correct valid predictions and the number of predictions which
were incorrect, as well as the percent of predictions made which
were correct. Note that, considering all five ability areas where
such data were available, in 75 percent of the cases in which tests
were predicted to be valid, they were indeed valid predictors of
job performance. These results, taken together with the phi
coefficient data, would seem to suggest that, if anything, predictions
based upon PAQ data are conservative in nature, and are relatively
accurate in their prediction of valid indicators of job
performance.
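
The frequency count just described reduces to a simple proportion. The sketch below applies it to the correct and incorrect counts reported in Table 2; the dictionary layout and function name are illustrative assumptions only.

```python
# (number correct, number incorrect) among tests predicted to be valid,
# as reported in Table 2.
counts = {
    "G-Intelligence": (9, 3),
    "V-Verbal Aptitude": (7, 0),
    "N-Numerical Aptitude": (15, 6),
    "S-Spatial Aptitude": (3, 1),
    "Q-Clerical Perception": (13, 6),
}


def percent_correct(correct: int, incorrect: int) -> float:
    """Percent of 'valid' predictions that were in fact valid."""
    return 100.0 * correct / (correct + incorrect)


total_correct = sum(c for c, _ in counts.values())
total_incorrect = sum(i for _, i in counts.values())

for test, (correct, incorrect) in counts.items():
    print(f"{test}: {percent_correct(correct, incorrect):.0f} percent")
print(f"All tests: {percent_correct(total_correct, total_incorrect):.0f} percent")
```

Note that this count considers only tests predicted to be valid; it says nothing about tests predicted to be invalid that might nevertheless have proved valid.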
TABLE 1

Correlations Between Predicted and Actual
Mean Test Scores, Cutoff Scores, Validity
Coefficients, and Phi Coefficients for
Valid - Non-Valid Tests


Pearson Product-Moment Correlations: Criterion of Mean Test Scores

                          Total Sample                  Subsample N=89
Test                      r       significance   N      r             significance   N

G-Intelligence            .32     .011           49     [illegible]   .001           42
V-Verbal Aptitude         .52     .001           34     .48           .001           34
N-Numerical Aptitude      .54     .001           69     .56           .001           62
S-Spatial Aptitude        .30     .030           39     .29           .050           32
P-Form Perception         *       *              *      *             *              *
Q-Clerical Perception     .68     .001           31     .68           .001           31
K-Motor Coordination      *       *              *      *             *              *
F-Finger Dexterity        .02     .480           7      *             *              *
M-Manual Dexterity        .67     .050           7      *             *              *


Pearson Product-Moment Correlations: Criterion of Cutoff Scores

                          Total Sample                  Subsample N=89
Test                      r       significance   N      r             significance   N

G-Intelligence            .28     .028           47     .39           .006           40
V-Verbal Aptitude         .70     .001           34     .70           .001           34
N-Numerical Aptitude      .61     .001           45     .64           .001           38
S-Spatial Aptitude        .53     .024           14     .53           .110           7
P-Form Perception         *       *              *      *             *              *
Q-Clerical Perception     *       *              *      *             *              *
K-Motor Coordination      *       *              *      *             *              *
F-Finger Dexterity        .20     .329           7      *             *              *
M-Manual Dexterity        .23     .305           7      *             *              *

* Insufficient number of cases for analysis

TABLE 1 (Con't.)


Pearson Product-Moment Correlations: Criterion of Validity Coefficients

                          Total Sample
Test                      r       significance   N

G-Intelligence            .26     .191           13
V-Verbal Aptitude         -.36    .022           31
N-Numerical Aptitude      .19     .072           56
S-Spatial Aptitude        .15     .204           29
P-Form Perception         *       *              *
Q-Clerical Perception     .16     .182           34
K-Motor Coordination      *       *              *
F-Finger Dexterity        *       *              *
M-Manual Dexterity        *       *              *

Subsample N=89: same as for total sample.


Phi Coefficients: Criterion of Valid vs. Non-valid Tests

                          Total Sample
Test                      r       significance   N

G-Intelligence            -.15    .303           13
V-Verbal Aptitude         .49     .003           31
N-Numerical Aptitude      .30     .011           56
S-Spatial Aptitude        .53     .001           29
P-Form Perception         *       *              *
Q-Clerical Perception     .01     .485           34
K-Motor Coordination      *       *              *
F-Finger Dexterity        *       *              *
M-Manual Dexterity        *       *              *

Subsample N=89: same as for total sample.

* Insufficient number of cases for analysis

TABLE 2

Frequency Count of Correct and Incorrect
Predictions Only When PAQ-based Data Predict
Tests to be Valid Indicators of Job Performance


                          No. Correct    No. Incorrect    Percent
Test                      Predictions    Predictions      Correct

G-Intelligence            9              3                75
V-Verbal Aptitude         7              0                100
N-Numerical Aptitude      15             6                71
S-Spatial Aptitude        3              1                75
P-Form Perception         *              *                *
Q-Clerical Perception     13             6                68
K-Motor Coordination      *              *                *
F-Finger Dexterity        *              *                *
M-Manual Dexterity        *              *                *

All Tests Together        47             16               75

* Insufficient number of cases for analysis

DISCUSSION


It would seem, based on the results reported here, that
PAQ-based data can indeed serve as the basis of a job component
validity model, as reflected by the fact that there is a reasonable
relationship between test score data predicted by the PAQ and
actual test score data for job incumbents resulting from actual
test validity studies. Although data were not available in all the
areas tested by the GATB, for those areas for which adequate data
were available there were significant relationships between
PAQ-predicted and obtained mean test scores, and between predicted and
obtained cut-off scores. Furthermore, for the tests which were
predicted to be valid by the PAQ, in only 25 percent of the cases were
the tests not actually reported to be valid. These results are, in
and of themselves, clearly indicative of the utility of PAQ-based
data in a job component validity model, and, when all
the data are taken into consideration, the support is rather
impressive, especially considering certain obvious shortcomings in the
available data.

In considering these possible shortcomings, for example, the
regression equations used to predict test data were not based on
commercially-available tests such as were used in this study, but
were based on GATB data. Although the tests used in this study were
matched with the corresponding GATB tests as well as possible, it is
obvious that not all tests classified as representing the same
construct are necessarily measuring the "identical" construct. Thus,
we have a regression equation derived to predict scores on one test
of, say, verbal aptitude, and we are using that regression equation
to predict test data based on other tests of "verbal aptitude" which
may be at least somewhat different in content. The magnitude of
the obtained relationships is all the more impressive in view of
the possible "slippage" arising from this possible disparity.

There are other problems as well. The process of equating the
norms for the different tests may also have added some error
variance into the prediction system, although this probably did not play
a major role in reducing the predictability of test scores. A
much more serious problem probably stems from the very nature of the
various validity studies which provided the data which we were
trying to predict. Of course, there was no control on the design of
the validity studies which served as the sources of the criterion
test data, and not only did the quality of the studies seem to vary
somewhat, but so did the criteria used in the different studies. Had
the same criteria been used throughout, and had all
studies been carried out in the same fashion, it might have been
easier to predict the outcome from the PAQ-based equations.

Unfortunately, these (and perhaps other) problems are typical
of those which involve the collection of already existing research
data from a variety of organizations. (The data obtained from rats
in a laboratory certainly can be obtained under much more neatly
"controlled" conditions than human data from the real
world of work.) The nature of this project required the
collection of test validity and/or normative data from various
organizations, and it was inevitable that the adequacy of the various
test validation studies used would be variable, and that the
criteria used in them would vary. The collective effect of
some of these problems, and of their effects on the summarizing and
analysis of the data, would be expected to reduce the strength of
the actual relationships between the predictors and the criterion
values, so that the results represent a conservative estimate of the
predictability of the PAQ-based data. In the light of these constraints,
the results seem generally to give further support to the potential
use of a structured job analysis procedure as the basis for
establishing personnel specifications for jobs.

This report is presented as a preliminary report
covering relevant data that were available. Efforts are being made
to obtain, for additional jobs, relevant test data and PAQ analyses
toward the end of building up a larger sample for a subsequent
analysis within the next several months. In connection with that
analysis it is expected that a more recently developed set of PAQ
job dimensions will be used, specifically a set that would be based
on a larger and more representative sample of jobs. It is also
hoped that data for an expanded sample of jobs would include test
validity and/or normative data for a wider range of
commercially-available tests, thus providing a basis for greater "generalization"
of PAQ-based job data for the estimation of personnel requirements
of jobs using the basic job component validity model.

APPENDIX A

Jobs Included in the Sample


Jobs for which data were obtained directly from various
organizations:


Job Title                                   D.O.T. Number*

Feeder Catcher                              619.886
General Factory Worker                      899.381
Roll Catcher                                643.986
Slotter                                     651.782
Utility                                     922.883
Stripper                                    749.887
Apprentice Steelworker                      **
Bundler Stacker                             221.388
Oil Maintenance and Operator (trainee)      **
Oil Assistant Operator                      542.280
Business Forms Pressmen                     **
Special Officer (Dept. Store)               **
Saleclerk                                   290.478
Pipefitter (trainee)                        862.381
Oil Refinery Process Trainee                452.280
Computer Operator                           213.282
General Clerk                               209.388
Billing Checker                             209.688
Requisition Handler                         221.388
Order Filler                                922.887
Stock Assistant                             223.387
Typist                                      112.800
Shipping Packer                             227.587
Shipping Checker                            209.688
Sorter                                      222.687
Returned Goods Receiving Clerk              222.387
Material Handler                            929.887
Induction Clerk                             161.688
Bookkeeper                                  210.388
Keypunch Operator                           213.582
Calculating Machine Operator                216.488
Record Clerk                                222.587
Correspondence Clerk                        204.288
Timekeeper                                  219.388
Buyer's Assistant                           223.368
Receiver Checker                            222.687
Secondary Receiver                          222.587
Industrial Truck Driver                     905.883


There were also several clusters of jobs which have no particular
title since they are actually composites of several jobs included
in this sample. These clusters resulted from various research
efforts on the part of the organizations, and were treated as
individual jobs.


*Code number from the Dictionary of Occupational Titles

**The title as furnished by the company covers several related
D.O.T. classifications.

Jobs for which data were obtained from journals, test manuals,
etc.:

Job Title                                   D.O.T. Number*

Electrician                                 824.281
Bank Clerk                                  209.388
Bookkeeping Machine Operator                215.388
Secretary                                   201.368
Meter Reader                                239.588
Mail Clerk                                  231.588
Shipping Clerk                              222.587
File Clerk                                  206.388
Gas Serviceman                              637.281
Yard Clerk                                  910.388
Industrial Engineer                         012.188
Electrical Technician                       003.181
Packer                                      920.887
Machinist                                   638.281
Subscription Clerk                          209.488
Telephone Service Representative            249.368
Aircraft Inspector                          619.381
Truck Driver                                905.883
Telephone Operator                          235.862
Plant Worker                                355.878
Assembler                                   726.781
Coding Clerks                               219.388
Aircraft Manufacturing Foremen              621.131
Draftsman                                   005.281
Accountant                                  160.188
Insurance Sales Representative              250.258
Programmer                                  007.187
Claims Adjustor                             241.168
Accounting Clerk                            219.488
Computer Operator                           213.382
Stenographer                                207.388
Receptionist                                237.368
Bank Teller                                 212.368
Reservations Clerk                          912.368
Typist                                      112.800
Clerk                                       105.000
Keypunch Machine Operator                   213.582
Policeman                                   375.268


*Code number from the Dictionary of Occupational Titles


APPENDIX B


Tests Used to Measure the Various Constructs


Intelligence

Wonderlic - 21 cases
Otis Test of Mental Ability - 3 cases
Test of Learning Ability - 6 cases
SRA Adaptability Test - 2 cases
Special (or in-house) - 7 cases

Verbal Aptitude

SRA-Verbal - 7 cases
PTI-Verbal - 7 cases
SET-Verbal - 2 cases
EAS-Verbal - 13 cases

Numerical Aptitude

SRA-Numerical (or Arithmetic Form) - 3 cases
EAS-Numerical - 14 cases
SET-Numerical - 3 cases
PTI-Numerical - 7 cases
Arithmetic Fundamentals - 3 cases
Arithmetic Reasoning Test - 1 case
FIT-Arithmetic - 4 cases
Special (or in-house) - 31 cases

Spatial Aptitude

Minnesota Paper Forms Board - 2 cases
EAS-Spatial - 5 cases
SRA-Assembly - 1 case
Special (or in-house) - 30 cases

Clerical Perception

SET-Clerical - 3 cases
Special (or in-house) - 24 cases

APPENDIX C
Development of Norms

In order to carry out this study, it was necessary to have
all test data in some common framework - that is, all test
scores had to be expressed in terms of some common scale. In
order to do this, it would have been desirable to have data on
all the tests used in this study for the same population. This
was clearly not possible. Therefore, some alternative strategy
had to be formulated. Since the tests of the General Aptitude
Test Battery (GATB) of the United States Training and
Employment Service were used as the criterion tests in other studies
involving the PAQ in a job component validity model (Mecham and
McCormick, 1969; Marquardt and McCormick, 1973), it was decided
that the norms for the GATB would be utilized as the basic
framework. The GATB tests are reported in standard score form, with
a mean of 100 and a standard deviation of 20, these values being
derived from test data on over 20,000 people collected by the
United States Training and Employment Service.

The major problem, however, was with the other tests that
would be used in this study. Although it was not very difficult
to assign any given test to an appropriate construct, since the
"constructs" utilized in this study were of such a broad nature
(for example, it seems obvious that the Short Employment Test of
Verbal Ability should be classified as a "Verbal Aptitude" test),
there is no body of data indicating the norms for these tests on a
"general" population. Not only were the different tests used with
different populations, but for most tests there was not a single
population that could be considered as a "general working
population" for which norms were supplied. Since that is basically
what the GATB norms are based on, it seemed desirable to have all
other norms based on the same kind of population.

The first problem, then, was to construct general norms for
each of the different tests. The seriousness of this problem
varied for the different tests. For example, for the Employee
Aptitude Survey Tests, norms are provided for a general working
population, and so these norms were used. For the Otis Test of
Mental Ability, norms are provided for a general population, but
these norms are given separately for males and females, and so
they had to be consolidated into a single set of norms. For the
various tests published by the Psychological Corporation (viz.
the Short Employment Tests and the Personnel Tests for Industry),
no general norms are provided. Rather, there are norms provided
for separate occupational groups. In general, the occupational
groups for which such norms were provided were too
numerous to be reasonably consolidated, so these norms were
sampled. That is, a number of groups which seemed to have
"high" norms were combined with a number of groups which seemed
to have "low" norms, yielding a single set of norms which it was
felt provided a fair representation of a norm for a "general"
population.
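
The report does not state the arithmetic by which separate sets of norms were consolidated. One common approach, offered here only as an assumption, is to pool the group means and standard deviations, weighting each group by its size. The group sizes, means, and standard deviations below are hypothetical.

```python
import math


def pool_norms(groups):
    """Combine (n, mean, sd) norm groups into a single (mean, sd).

    Assumes each sd is a population-style standard deviation; the pooled
    variance includes both within-group and between-group variation.
    """
    total_n = sum(n for n, _, _ in groups)
    pooled_mean = sum(n * m for n, m, _ in groups) / total_n
    mean_of_squares = sum(n * (sd ** 2 + m ** 2) for n, m, sd in groups) / total_n
    pooled_sd = math.sqrt(mean_of_squares - pooled_mean ** 2)
    return pooled_mean, pooled_sd


# Hypothetical norm groups: (number of cases, mean, standard deviation).
mean, sd = pool_norms([(100, 40.0, 10.0), (100, 50.0, 10.0)])
print(mean, round(sd, 2))  # 45.0 11.18
```

The pooled standard deviation exceeds that of either group because the two group means differ, which is the intended effect when "high" and "low" groups are combined to approximate a general population.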

Thus, for each test a single set of norms was constructed,
recognizing that the populations deemed "general" were different
for each test. Again, there was no way to avoid this lack of a
single "general" population for whom scores were known on all
tests used. The test scores which were obtained from either the
organizations involved in the study, or any of the other methods
discussed earlier, for each job, were then standardized on the
general norms established for the corresponding GATB test. This
resulted in a standard score for each test, for the incumbents on
each job, reflecting where the mean test score for incumbents on
that job fell on the particular construct relative to all other jobs.

The problem still remained of converting all these scores to
some common metric. As was pointed out above, the GATB norms were
to be used as the framework for this common metric. Since one set
of standard scores is always directly convertible to any other set
of standard scores, the scores for each job were then converted
into standard scores with a mean of 100 and a standard deviation of
20. For example, if the mean score for a sample of plumbers
happened to be 48 on the Revised Minnesota Paper Forms Board, and
this was found to be equal to a standard score of .50 based on the
general population norms constructed for that test, conversion to
the GATB norms was simply a matter of multiplying the GATB standard
deviation of 20 by .50 (giving 10), and adding this to the GATB norm
mean of 100 (since the standard score was positive), yielding a GATB
standard score of 110. This process was repeated for each job and
each test so that the final product was a continuum for each
"construct," with the mean scores of incumbents on the various jobs
being assigned positions on this continuum based on standard scores
with a mean of 100 and a standard deviation of 20. It was these
converted scores that were used as the criterion values of "mean
test scores" used in this study. Thus, since the PAQ-based
predictions are in terms of the GATB tests, and the criterion test
scores for the various jobs were in GATB standard score terms, mean
test score values for the two could be directly compared for any
job. A somewhat similar procedure was used in deriving the predicted
cut-off score criterion values, which were one standard deviation
below the mean of the scores of incumbents on the individual jobs.
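
The conversion described in this appendix can be sketched as follows. The GATB mean of 100 and standard deviation of 20 are taken from the text; the function names are illustrative, and the norm mean (44) and norm standard deviation (8) for the Revised Minnesota Paper Forms Board are hypothetical values chosen only so that a raw score of 48 corresponds to the standard score of .50 used in the example.

```python
GATB_MEAN = 100.0  # from the text
GATB_SD = 20.0     # from the text


def standard_score(raw_mean: float, norm_mean: float, norm_sd: float) -> float:
    """Express a job's mean raw score as a standard (z) score on the test's norms."""
    return (raw_mean - norm_mean) / norm_sd


def to_gatb_units(z: float) -> float:
    """Convert a standard (z) score to GATB standard-score units."""
    return GATB_MEAN + GATB_SD * z


# Worked example from the text: a standard score of .50 becomes 110.
print(to_gatb_units(0.50))  # 110.0

# Same example starting from the raw mean of 48, using hypothetical norms.
z = standard_score(raw_mean=48.0, norm_mean=44.0, norm_sd=8.0)
print(to_gatb_units(z))  # 110.0
```

The same formula handles negative standard scores without special treatment: a standard score of -.50 would yield a GATB standard score of 90.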